CAAD BLASTP: NCBI BLASTP Accelerated with FPGA-Based Pre-Filtering

Authors

  • Jin H. Park
  • Yunfei Qiu
  • Martin C. Herbordt
Abstract

NCBI BLAST has become the de facto standard in bioinformatic approximate string matching and so its acceleration is of fundamental importance. The problem is that it uses complex heuristics which make it difficult to simultaneously achieve both substantial speed-up and exact agreement with the original output. Our approach is to prefilter the database. To make this work we have developed a novel heuristic which we append to a previously described structure for ungapped alignment. This enables us to quickly reduce the database by factors of 300 and 1100, for the ungapped and gapped options, respectively, while rejecting no significant sequences. On current hardware we anticipate a speed-up of at least a factor of 10 for NCBI BLASTP, independent of sensitivity settings. This filter is portable to other BLAST codes, and other filters can be similarly integrated into NCBI BLAST.
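The prefiltering idea in the abstract can be illustrated with a minimal sketch: score each database sequence against the query with an ungapped alignment, and pass only sequences at or above a threshold on to the full BLASTP search. This is not the authors' FPGA design or their novel heuristic, which the abstract does not detail. The function names, the toy match/mismatch scores (a real filter would likely use a substitution matrix such as BLOSUM62), and the threshold are all assumptions made for illustration.

```python
from typing import Dict, List

# Toy scoring scheme (assumption): real protein filters use a substitution matrix.
MATCH, MISMATCH = 2, -1


def best_ungapped_score(query: str, subject: str) -> int:
    """Return the best ungapped local alignment score over all diagonals."""
    best = 0
    # offset = subject index minus query index; each offset is one diagonal.
    for offset in range(-(len(query) - 1), len(subject)):
        q_start = max(0, -offset)
        s_start = max(0, offset)
        running = 0
        for q, s in zip(query[q_start:], subject[s_start:]):
            # Kadane-style running sum: restart a segment when the score drops below 0.
            running = max(0, running + (MATCH if q == s else MISMATCH))
            best = max(best, running)
    return best


def prefilter(query: str, database: Dict[str, str], threshold: int) -> List[str]:
    """Keep only the names of sequences whose ungapped score reaches the threshold."""
    return [
        name
        for name, seq in database.items()
        if best_ungapped_score(query, seq) >= threshold
    ]


if __name__ == "__main__":
    db = {"a": "GGMKVLAGG", "b": "TTTTTT", "c": "MKV"}
    print(prefilter("MKVLA", db, threshold=8))
```

In this sketch the threshold controls the trade-off the abstract describes: a higher value shrinks the database further, while a value set too high would start rejecting significant sequences.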


Related resources

Design and analysis of an accelerated seed generation stage for BLASTP on the Mercury

NCBI BLASTP is a popular sequence analysis tool used to study the evolutionary relationship between two protein sequences. Protein databases continue to grow exponentially as entire genomes of organisms are sequenced, making sequence analysis a computationally demanding task. For example, a search of the E. coli k12 proteome against the GenBank Non-Redundant database takes 36 hours on a standa...


Acceleration of Gapped Alignment in BLASTP Using the Mercury System

Protein databases have grown exponentially over the last decade. This exponential growth has made extracting valuable information from these databases increasingly time consuming. This project presents a new method of accelerating a commonly used program for performing similarity searching on protein databases, BLASTP. This project describes the design and implementation of Mercury BLASTP, a cu...


PSimScan: Algorithm and Utility for Fast Protein Similarity Search

In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a ...


BlastR—fast and accurate database searches for non-coding RNAs

We present and validate BlastR, a method for efficiently and accurately searching non-coding RNAs. Our approach relies on the comparison of di-nucleotides using BlosumR, a new log-odd substitution matrix. In order to use BlosumR for comparison, we recoded RNA sequences into protein-like sequences. We then showed that BlosumR can be used along with the BlastP algorithm in order to search non-cod...


Using Kleisli to Bring Out Features in BLASTP Results.

BLASTP gives a good overall indication of what function a protein might have. However, analysis of BLASTP reports to discover various domain features in the protein is still tedious. We address this problem by using the modern data integration system, Kleisli, to bring out annotated features of BLASTP results. We further strengthen our solution by incorporating additional information from SEG, ...



Publication date: 2009